Tutorial on Microbiome Data Analysis
This tutorial gets You started with R tools for microbial ecology. In particular, to provide an introduction to
- R tools for microbial ecology
- Role of custom data formats and tools in data analytical workflows
- Reproducible document generation
- Possibilities and challenges in population-level microbiome profiling studies
The tutorial slides are available here
Installation
Launch R/RStudio and install the microbiome R package (see installation instructions).
To initiate reproducible documentation, do the following in RStudio:
- Open a new Rmarkdown (.html) file
- Convert that .html file with the ‘knit’ button
- Modify the file and knit again to make your own reproducible report
Example data: Intestinal microbiota of 1006 Western adults
Example data set will be the HITChip Atlas, which is available via the microbiome R package in phyloseq format. This data set from Lahti et al. Nat. Comm. 5:4344, 2014 comes with 130 genus-like taxonomic groups across 1006 western adults with no reported health complications. Some subjects have also short time series. Load the data in R with:
# Data citation doi: 10.1038/ncomms5344
library(microbiome)
library(knitr)
data(atlas1006)
print(atlas1006)## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 130 taxa and 1151 samples ]
## sample_data() Sample Data: [ 1151 samples by 10 sample variables ]
## tax_table() Taxonomy Table: [ 130 taxa by 2 taxonomic ranks ]
Phyloseq data stucture for taxonomic profiling
Explore the phyloseq data format. See examples on microbiome data manipulation.
Diversity
Explore the estimation and analysis of various diversity indices and taxonomic composition.
| observed | <<<<<<< HEADchao1 | diversity_inverse_simpson | diversity_gini_simpson | diversity_shannon | diversity_fisher | diversity_coverage | ||||||| merged common ancestorsdiversities_inverse_simpson | diversities_gini_simpson | diversities_shannon | diversities_fisher | diversities_coverage | =======chao1 | diversities_inverse_simpson | diversities_gini_simpson | diversities_shannon | diversities_fisher | diversities_coverage | >>>>>>> 371ff05b569a6264d9798b347807497f5edc6dbdevenness_camargo | evenness_pielou | evenness_simpson | evenness_evar | evenness_bulla | dominance_dbp | dominance_dmn | dominance_absolute | dominance_relative | dominance_simpson | dominance_core_abundance | dominance_gini | rarity_log_modulo_skewness | rarity_low_abundance | rarity_noncore_abundance | rarity_rare_abundance | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sample-1 | 99 | 108.3750 | 12.983930 | 0.9229817 | 3.187815 | 16.07126 | 5 | 0.3675967 | 0.3567791 | 0.1311508 | 0.1669647 | 0.3447496 | 0.1759515 | 0.3379428 | 1336 | 0.1759515 | 0.0770183 | 0.9998683 | 0.8488881 | 2.059488 | 0.0264718 | 0.0001317 | 0.0001317 | |||||||||||
| Sample-2 | 98 | 108.5625 | 16.595578 | 0.9397430 | 3.394462 | 15.04043 | 7 | 0.3437543 | 0.3679621 | 0.1693426 | 0.1483521 | 0.3958182 | 0.1716594 | 0.2821246 | 1742 | 0.1716594 | 0.0602570 | 1.0000000 | 0.8188881 | 2.055369 | 0.0198069 | 0.0000000 | 0.0000000 | |||||||||||
| Sample-3 | 99 | 110.2500 | 8.703493 | 0.8851036 | 2.864855 | 16.26890 | 4 | 0.3242763 | 0.3229022 | 0.0879141 | 0.1906035 | 0.2680793 | 0.2793437 | 0.3985416 | 1992 | 0.2793437 | 0.1148964 | 0.9998598 | 0.8806975 | 2.053348 | 0.0412284 | 0.0000000 | 0.0000000 | |||||||||||
| Sample-4 | 100 | 111.2500 | 10.709023 | 0.9066208 | 3.056922 | 15.21763 | 4 | 0.3924486 | 0.3289708 | 0.1070902 | 0.1559785 | 0.3336218 | 0.1957623 | 0.3813911 | 2125 | 0.1957623 | 0.0933792 | 0.9999079 | 0.8604110 | 2.057767 | 0.0247812 | 0.0000921 | 0.0000921 | |||||||||||
| Sample-5 | 98 | 112.0625 | 12.248008 | 0.9183541 | 3.073742 | 14.59865 | 4 | 0.4112500 | 0.3272493 | 0.1249797 | 0.1429785 | 0.3194244 | 0.1686667 | 0.3334167 | 2024 | 0.1686667 | 0.0816459 | 0.9999167 | 0.8672179 | 2.057402 | 0.0234167 | 0.0000000 | 0.0000000 | |||||||||||
| Sample-6 | 99 | 113.0625 | 9.665190 | 0.8965359 | 2.941993 | 15.94374 | 3 | 0.2819936 | 0.3277479 | 0.0976282 | 0.1824172 | 0.3030473 | 0.2273187 | 0.3802123 | 1799 | 0.2273187 | 0.1034641 | 1.0000000 | 0.8735717 | 2.060100 | 0.0380339 | 0.0000000 | 0.0000000 |
Assign the estimated diversity to sample metadata
<<<<<<< HEADsample_data(pseq)$diversity <- tab$diversity_shannonsample_data(pseq)$diversity <- tab$diversities_shannonVisualize the data
Technical biases
Explore potential technical biases in the data. DNA extraction method has a remarkable effect on sample grouping.
<<<<<<< HEAD# Use relative abundance data
ps <- microbiome::transform(pseq, "compositional")
# Pick core taxa
ps <- core(ps, detection = 0, prevalence = 80/100)
# For this example, choose samples with DNA extraction information available
ps <- subset_samples(ps, !is.na(DNA_extraction_method))
# Illustrate sample similarities with PCoA (NMDS)
plot_landscape(ps, "NMDS", "bray", col = "DNA_extraction_method")# Use relative abundance data
ps <- microbiome::transform(pseq, "compositional")
# For this example, choose samples with DNA extraction information available
ps <- subset_samples(ps, !is.na(DNA_extraction_method))
# Or: you could focus on a single DNA extraction method
# ps <- subset_samples(ps, DNA_extraction_method == "r")
# Illustrate sample similarities with PCoA (NMDS)
plot_landscape(ps, "NMDS", "bray", col = "DNA_extraction_method")# Use relative abundance data
ps <- microbiome::transform(pseq, "compositional")
# For this example, choose samples with DNA extraction information available
ps <- subset_samples(ps, !is.na(DNA_extraction_method))
# Or: you could focus on a single DNA extraction method
# ps <- subset_samples(ps, DNA_extraction_method == "r")
# Illustrate sample similarities with PCoA (NMDS)
plot_landscape(ps, "NMDS", "bray", col = "DNA_extraction_method")Core microbiota
Core microbiota refers to the set of species shared by (almost) all individuals.
A full phyloseq object with just the core taxa is obtained as follows:
# Transform to compositional abundances
pseq.rel <- microbiome::transform(pseq, "compositional")
# Pick the core (>0.1% relative abundance in >50% of the samples)
pseq.core <- core(pseq.rel, detection = 0.1/100, prevalence = 50/100)Visualizing the core
<<<<<<< HEAD# Core with compositionals:
prevalences <- seq(.05, 1, .05)
detections <- 10^seq(log10(1e-3), log10(.2), length = 10)
p <- plot_core(pseq.rel, plot.type = "heatmap",
prevalences = prevalences, detections = detections) +
xlab("Detection Threshold (Relative Abundance)")
print(p) # Core with compositionals:
prevalences <- seq(.05, 1, .05)
detections <- 10^seq(log10(1e-3), log10(.2), length = 10)
p <- plot_core(pseq.rel, plot.type = "heatmap",
prevalences = prevalences, detections = detections) +
xlab("Detection Threshold (Relative Abundance)")
print(p) # Core with compositionals:
prevalences <- seq(.05, 1, .05)
detections <- 10^seq(log10(1e-3), log10(.2), length = 10)
p <- plot_core(pseq.rel, plot.type = "heatmap",
prevalences = prevalences, detections = detections) +
xlab("Detection Threshold (Relative Abundance)")
print(p) Other tools
Explore further tools in microbiome tutorial.